Apparatus for analyzing an audio signal with regard to
rhythm information of the audio signal by using an
autocorrelation function

Cross-Reference to Related Application:

This application is a continuation of copending 
International Application No. PCT/EP02/05171, filed May 10, 
2002, which designated the United States and was not 
published in English. 

1. Field of the Invention: 

The present invention relates to signal processing concepts 
and particularly to the analysis of audio signals with 
regard to rhythm information.

2. Description of the Related Art:

Over recent years, the availability of multimedia data,
such as audio or video data, has increased significantly.
This is due to a series of technical factors, in particular
the broad availability of the internet, of efficient
computer hardware and software, and of efficient methods
for data compression, i.e. source encoding, of audio and
video data.

The huge amount of audiovisual data available worldwide,
for example on the internet, requires concepts that make it
possible to assess, catalog, etc. these data according to
content criteria. There is a demand to be able to search
for and find multimedia data in a targeted way by
specifying useful criteria.

This requires so-called "content-based" techniques, which
extract so-called features from the audiovisual data. These
features represent important characteristic properties of
the signal. Based on such features or combinations of these
features, similarity relations and common features between
audio or video signals can be derived. This is done by
comparing and relating the extracted feature values from
the different signals, which are also simply referred to as
"pieces".

The determination and extraction of features that have not
only signal-theoretical but also immediate semantic
meaning, i.e. that represent properties immediately
perceived by the listener, is of special interest.

This enables the user to phrase search requests in a simple
and intuitive way to find pieces from the whole existing
data inventory of an audio signal database. In the same
way, semantically relevant features make it possible to
model similarity relationships between pieces that come
close to human perception. The use of features with
semantic meaning also enables, for example, the automatic
proposal of pieces of interest to a user if his preferences
are known.

In the area of music analysis, the tempo is an important
musical parameter with semantic meaning. The tempo is
usually measured in beats per minute (BPM). The automatic
extraction of the tempo as well as of the bar emphasis of
the "beat", or generally the automatic extraction of rhythm
information, is an example of capturing a semantically
important feature of a piece of music.

Further, there is a demand that the extraction of features,
i.e. extracting rhythm information from an audio signal,
can take place in a robust and computationally efficient
way. Robustness means that it does not matter whether the
piece has been source-encoded and decoded again, whether
the piece is played via a loudspeaker and recorded by a
microphone, whether it is played loud or soft, or whether
it is played by one instrument or by a plurality of
instruments.
For determining the bar emphasis and thereby also the
tempo, i.e. for determining rhythm information, the term
"beat tracking" has become established among experts. It
is known from the prior art to perform beat tracking based
on a note-like or transcribed signal representation, i.e.
in MIDI format. However, the aim is to dispense with such
metarepresentations and to perform the analysis directly
on, for example, a PCM-encoded or, generally, a digitally
present audio signal.

The expert publication "Tempo and Beat Analysis of Acoustic
Musical Signals" by Eric D. Scheirer, J. Acoust. Soc. Am.
103:1, (Jan 1998) pp. 588 - 601 discloses a method for the
automatic extraction of a rhythmical pulse from musical
excerpts. The input signal is split into a series of
sub-bands via a filter bank, for example into 6 sub-bands
with transition frequencies of 200 Hz, 400 Hz, 800 Hz,
1600 Hz and 3200 Hz. Low-pass filtering is performed for
the first sub-band, high-pass filtering for the last
sub-band, and bandpass filtering for the intermediate
sub-bands. Every sub-band is processed as follows. First,
the sub-band signal is rectified, i.e. the absolute value
of the samples is determined. The resulting values are then
smoothed, for example by averaging over an appropriate
window, to obtain an envelope signal. To decrease the
computing complexity, the envelope signal can be
sub-sampled. The envelope signals are then differentiated,
i.e. sudden changes of the signal amplitude are
preferentially passed on by the differentiating filter. The
result is then limited to non-negative values. Every
envelope signal is then fed into a bank of resonant
filters, i.e. oscillators, which comprises one filter for
every tempo region, so that the filter matching the musical
tempo is excited the most. The energy of the output signal
is calculated for every filter as a measure of how well the
tempo of the input signal matches the tempo belonging to
the filter. The energies for every tempo are then summed
over all sub-bands, wherein the largest energy sum
characterizes the tempo supplied as the result, i.e. the
rhythm information. In contrast to autocorrelation
functions, it is advantageous that the oscillator bank also
reacts to a stimulus with output signals at double, triple,
etc. the tempo or at rational multiples (such as 2/3, 3/4)
of the tempo. An autocorrelation function does not have
that property; it provides output signals only at one half,
one third, etc. of the tempo.
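The per-sub-band preprocessing described above (rectifying,
smoothing, sub-sampling, differentiating, limiting to
non-negative values) can be sketched as follows. This is
only an illustrative sketch, not the implementation of the
cited publication: the function name onset_envelope, the
moving-average smoother and the parameters window and step
are assumptions chosen for brevity.

```python
def onset_envelope(samples, window=4, step=2):
    """Rectify, smooth, sub-sample, differentiate and clip one sub-band signal.

    `window` (moving-average length) and `step` (sub-sampling factor)
    are hypothetical values chosen only for illustration.
    """
    # Rectification: absolute value of every sample.
    rectified = [abs(x) for x in samples]

    # Smoothing: causal moving average over up to `window` samples.
    smoothed = []
    for i in range(len(rectified)):
        chunk = rectified[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))

    # Sub-sampling to reduce the computing effort.
    subsampled = smoothed[::step]

    # Differentiation: a first-order difference emphasizes sudden changes.
    diff = [b - a for a, b in zip(subsampled, subsampled[1:])]

    # Limitation to non-negative values.
    return [max(0.0, d) for d in diff]


burst = [0, 0, 0, 0, 1, -1, 1, -1, 0, 0, 0, 0]
envelope = onset_envelope(burst)
```

In this toy input, only the rising edge of the burst
survives in the output; the falling edge is removed by the
limitation to non-negative values.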

A significant disadvantage of this method is the large
computing and memory complexity, particularly for the
realization of the large number of oscillators resonating
in parallel, only one of which is finally chosen. This
makes an efficient implementation, such as for real-time
applications, almost impossible.

The expert publication "Pulse Tracking with a Pitch
Tracker" by Eric D. Scheirer, Proc. 1997 Workshop on
Applications of Signal Processing to Audio and Acoustics,
Mohonk, NY, Oct 1997 describes a comparison of the
above-described oscillator concept to an alternative
concept, which is based on the use of autocorrelation
functions for the extraction of the periodicity from an
audio signal, i.e. the rhythm information of a signal. An
algorithm for modeling human pitch perception is used for
beat tracking.

The known algorithm is illustrated in Fig. 3 as a block
diagram. The audio signal is fed into an analysis
filterbank 302 via the audio input 300. The analysis
filterbank generates a number n of channels, i.e. of
individual sub-band signals, from the audio input. Every
sub-band signal contains a certain range of frequencies of
the audio signal. The filters of the analysis filterbank
are chosen such that they approximate the selection
characteristic of the human inner ear. Such an analysis
filterbank is also referred to as a gammatone filterbank.

The rhythm information of every sub-band is evaluated in
means 304a to 304c. For every input signal, an
envelope-like output signal is first calculated (modeled on
so-called inner hair cell processing in the ear) and
sub-sampled. From this result, an autocorrelation function
(ACF) is calculated to obtain the periodicity of the signal
as a function of the lag.

At the output of means 304a to 304c, an autocorrelation 
function is present for every sub-band signal, which 
represents the rhythm information of every sub-band signal. 

The individual autocorrelation functions of the sub-band
signals are then combined in means 306 by summation, to
obtain a sum autocorrelation function (SACF), which
reproduces the rhythm information of the signal at the
audio input 300. This information can be output at a tempo
output 308. High values in the sum autocorrelation function
show that a high periodicity of the note onsets is present
for the lag associated with a peak of the SACF. Thus, for
example, the highest value of the sum autocorrelation
function is searched for within the musically useful lags.

Musically useful lags are, for example, those in the tempo
range between 60 bpm and 200 bpm. Means 306 can further be
disposed to transform a lag time into tempo information.
Thus, a peak at a lag of one second corresponds, for
example, to a tempo of 60 beats per minute. Smaller lags
indicate higher tempos, while larger lags indicate tempos
lower than 60 bpm.
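The relation between lag and tempo, and the search for the
highest SACF value within the musically useful lags, can be
sketched as follows. The function names, the sample SACF
values and the lag spacing lag_step are assumptions made
for illustration; only the relation "tempo in bpm = 60
divided by the lag in seconds" and the range of 60 bpm to
200 bpm are taken from the text above.

```python
def lag_to_bpm(lag_seconds):
    """A lag of one second corresponds to 60 beats per minute."""
    return 60.0 / lag_seconds


def pick_tempo(sacf, lag_step, bpm_min=60.0, bpm_max=200.0):
    """Return the tempo of the highest SACF value within the useful lags.

    sacf[k] is assumed to hold the SACF value at a lag of k * lag_step
    seconds; lag 0 is skipped because it carries no tempo information.
    """
    best_bpm, best_value = None, float("-inf")
    for k in range(1, len(sacf)):
        bpm = lag_to_bpm(k * lag_step)
        if bpm_min <= bpm <= bpm_max and sacf[k] > best_value:
            best_bpm, best_value = bpm, sacf[k]
    return best_bpm


# Hypothetical SACF sampled every 0.125 s. The value 0.9 at index 2
# (0.25 s, i.e. 240 bpm) lies outside the useful range and is ignored.
sacf = [1.0, 0.1, 0.9, 0.2, 0.8, 0.1, 0.2, 0.1, 0.6]
tempo = pick_tempo(sacf, lag_step=0.125)
```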

This method has an advantage compared to the
first-mentioned method, since no oscillators have to be
implemented with a high computing and storage effort. On
the other hand, the concept is disadvantageous in that the
quality of the results depends strongly on the type of the
audio signal. If, for example, a dominant rhythm instrument
can be heard in an audio signal, the concept described in
Fig. 3 will work well. If, however, the voice is dominant,
which provides no particularly clear rhythm information,
the rhythm determination will be ambiguous. However, a band
could be present in the audio signal that contains mainly
rhythm information, such as a higher frequency band, where,
for example, a hi-hat of the drums is located, or a lower
frequency band, where the bass drum of the drums is located
on the frequency scale. Due to the combination of the
individual information, the fairly clear information of
these particular sub-bands is superimposed on and "diluted"
by the ambiguous information of the other sub-bands.

Another problem when using autocorrelation functions for
extracting the periodicity of a sub-band signal is that the
sum autocorrelation function obtained by means 306 is
ambiguous. The sum autocorrelation function at output 306
is ambiguous in that an autocorrelation function peak is
also generated at integer multiples of a lag. This can be
understood from the fact that a sinusoidal component with a
period of t0, when subjected to autocorrelation function
processing, generates, apart from the wanted maximum at t0,
also maxima at integer multiples of the lag, i.e. at 2t0,
3t0, etc.
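The ambiguity can be reproduced with a few lines of code.
The sketch below uses a pulse train with a period of 4
samples as a simple stand-in for the periodic component
mentioned above; the signal, its length and the helper name
autocorrelation are assumptions chosen for illustration.

```python
def autocorrelation(x, max_lag):
    """Plain (unnormalized) autocorrelation for lags 0..max_lag."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


# Pulse train with period t0 = 4 samples (8 pulses in 32 samples).
pulses = [1.0 if i % 4 == 0 else 0.0 for i in range(32)]
acf = autocorrelation(pulses, 12)

# Lags at which the ACF is non-zero, excluding lag 0.
peak_lags = [lag for lag in range(1, 13) if acf[lag] > 0.0]
```

Besides the wanted maximum at lag 4, peaks also appear at
lags 8 and 12, i.e. at 2t0 and 3t0, while no peak appears
at lag 2, which would correspond to the double tempo.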

The expert publication "A Computationally Efficient
Multipitch Analysis Model" by Tolonen and Karjalainen,
IEEE Transactions on Speech and Audio Processing, Vol. 8,
Nov 2000, discloses a computationally efficient model for
the periodicity analysis of complex audio signals. The
model divides the signal into two channels, a channel below
1000 Hz and a channel above 1000 Hz. From these, an
autocorrelation of the lower channel and an autocorrelation
of the envelope of the upper channel are calculated.
Finally, the two autocorrelation functions are summed. In
order to eliminate the ambiguities of the sum
autocorrelation function, the sum autocorrelation function
is processed further to obtain a so-called enhanced summary
autocorrelation function (ESACF). This postprocessing of
the sum autocorrelation function comprises a repeated
subtraction of versions of the autocorrelation function
spread by integer factors from the sum autocorrelation
function, with a subsequent limitation to non-negative
values.
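The repeated subtraction of spread versions with subsequent
limitation to non-negative values can be sketched as
follows. This is a simplified illustration, not the
implementation of the cited publication: the function
names, the linear interpolation used for spreading and the
choice of the factors 2 and 3 are assumptions.

```python
def stretch(values, factor):
    """Spread a lag curve by an integer factor (linear interpolation)."""
    out = []
    for k in range(len(values)):
        pos = k / factor                 # position in the original curve
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out


def enhance(sacf, factors=(2, 3)):
    """Subtract spread versions and clip to non-negative values each time."""
    result = [max(0.0, v) for v in sacf]
    for factor in factors:
        spread = stretch(result, factor)
        result = [max(0.0, r - s) for r, s in zip(result, spread)]
    return result


# ACF of an ideally periodic pulse train with a period of 4 lags.
sacf = [8.0, 0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0, 5.0]
enhanced = enhance(sacf)
```

For this idealized input, only the peak at lag 4 remains;
the peaks at lags 8 and 12 are removed. Note that the
subtraction is unweighted, so the spread peak of height 7
is subtracted in full from the smaller peaks.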

It is a disadvantage of this concept that the ambiguities
introduced by the autocorrelation function in the
individual sub-bands are eliminated only in the sum
autocorrelation function and not directly where they occur,
namely in the individual sub-bands.

A further disadvantage of this concept is the fact that the
autocorrelation function itself does not provide any hint
of the double, triple, etc. of the tempo to which an
autocorrelation peak is associated.

SUMMARY OF THE INVENTION 

It is the object of the present invention to provide an
apparatus and a method for analyzing an audio signal with
regard to rhythm information by using an autocorrelation
function, which is robust and computing-time-efficient.

In accordance with a first aspect of the invention, this
object is achieved by an apparatus for analyzing an audio
signal with regard to rhythm information of the audio
signal by using an autocorrelation function, comprising:
means for dividing the audio signal into at least two
sub-band signals; means for examining at least one sub-band
signal with regard to a periodicity in the at least one
sub-band signal by an autocorrelation function, to obtain
rhythm raw-information for the sub-band signal, wherein a
delay is associated to a peak of the autocorrelation
function; means for postprocessing the rhythm
raw-information for the sub-band signal determined by the
autocorrelation function, to obtain postprocessed rhythm
raw-information for the sub-band signal, so that in the
postprocessed rhythm raw-information an ambiguity in an
integer multiple of a delay, to which an autocorrelation
function peak is associated, is reduced, or a signal
portion is added at an integer fraction of a delay, to
which an autocorrelation function peak is associated; and
means for establishing the rhythm information of the audio
signal by using the postprocessed rhythm raw-information of
the sub-band signal and by using another sub-band signal of
the at least two sub-band signals.

In accordance with a second aspect of the invention, this
object is achieved by an apparatus for analyzing an audio
signal with regard to rhythm information of the audio 
signal by using an autocorrelation function, comprising: 
means for examining the audio signal with regard to a 
periodicity in the audio signal, to obtain rhythm raw- 
information for the audio signal, wherein a delay is 
associated to a peak of the autocorrelation function; means 
for postprocessing the rhythm raw-information for the audio 
signal determined by the autocorrelation function, to 
obtain postprocessed rhythm raw-information for the audio 
signal, so that in the postprocessed rhythm raw-information 
a signal portion is added at an integer fraction of a 
delay, to which an autocorrelation function peak is 
associated; and means for establishing rhythm information 
of the audio signal by using the postprocessed rhythm raw- 
information of the audio signal. 

In accordance with a third aspect of the invention, this
object is achieved by an apparatus for analyzing an audio
signal with regard to rhythm information of the audio
signal by using an autocorrelation function, comprising:
means for examining the audio signal with regard to a
periodicity in the audio signal, to obtain rhythm
raw-information for the audio signal, wherein a delay is
associated to a peak of the autocorrelation function; means
for postprocessing the rhythm raw-information for the audio
signal determined by the autocorrelation function, to
obtain postprocessed rhythm raw-information for the audio
signal, by subtracting a version of the rhythm
raw-information weighted by a factor unequal one and spread
by an integer factor larger than one; and means for
establishing the rhythm information of the audio signal by
using the postprocessed rhythm raw-information of the audio
signal.

In accordance with a fourth aspect of the invention, this
object is achieved by a method for analyzing an audio
signal with regard to rhythm information of the audio
signal by using an autocorrelation function, comprising:
dividing the audio signal into at least two sub-band
signals; examining at least one sub-band signal with regard
to a periodicity in the at least one sub-band signal by an
autocorrelation function, to obtain rhythm raw-information
for the sub-band signal, wherein a delay is associated to a
peak of the autocorrelation function; postprocessing the
rhythm raw-information for the sub-band signal determined
by the autocorrelation function, to obtain postprocessed
rhythm raw-information for the sub-band signal, so that in
the postprocessed rhythm raw-information an ambiguity in
the integer multiple of a delay, to which an
autocorrelation function peak is associated, is reduced, or
a signal portion is added at an integer fraction of a
delay, to which an autocorrelation function peak is
associated; and establishing the rhythm information of the
audio signal by using the postprocessed rhythm
raw-information of the sub-band signal and by using a
further sub-band signal of the at least two sub-band
signals.

In accordance with a fifth aspect of the invention, this
object is achieved by a method for analyzing an audio
signal with regard to rhythm information of the audio
signal by using an autocorrelation function, comprising:
examining the audio signal with regard to a periodicity in
the audio signal, to obtain rhythm raw-information for the
audio signal, wherein a delay is associated to a peak of
the autocorrelation function; postprocessing the rhythm
raw-information for the audio signal determined by the
autocorrelation function, to obtain postprocessed rhythm
raw-information for the audio signal, so that in the
postprocessed rhythm raw-information a signal portion is
added at an integer fraction of a delay, to which an
autocorrelation function peak is associated; and
establishing the rhythm information of the audio signal by
using the postprocessed rhythm raw-information of the audio
signal.

In accordance with a sixth aspect of the invention, this
object is achieved by a method for analyzing an audio
signal with regard to rhythm information of the audio 
signal by using an autocorrelation function, comprising: 
examining the audio signal with regard to a periodicity in 
the audio signal, to obtain rhythm raw-information for the 
audio signal, wherein a delay is associated to a peak of 
the autocorrelation function; postprocessing the rhythm 
raw-information for the audio signal determined by the 
autocorrelation function, to obtain postprocessed rhythm 
raw-information for the audio signal, by subtracting a 
version of the rhythm raw-information weighted with a 
factor unequal one and spread by an integer factor larger 
than one; and establishing the rhythm information of the 
audio signal by using the postprocessed rhythm raw- 
information of the audio signal.

The present invention is based on the knowledge that
postprocessing of an autocorrelation function can be
performed sub-band-wise, in order to eliminate the
ambiguities of the autocorrelation function for periodic
signals, or to add tempo information that autocorrelation
processing does not provide to the information obtained by
an autocorrelation function. According to an aspect of the
present invention, autocorrelation function postprocessing
of the sub-band signals is used to eliminate the
ambiguities already "at the root", or to add "missing"
rhythm information.

According to another aspect of the present invention,
postprocessing of the sum autocorrelation function is
performed to obtain postprocessed rhythm raw-information
for the audio signal, so that in the postprocessed rhythm
raw-information a signal portion is added at an integer
fraction of a delay to which an autocorrelation function
peak is associated. Thereby, it is possible to generate the
rhythm information that an autocorrelation function does
not provide at double, triple, etc. tempi or at rational
multiples, by calculating versions of the autocorrelation
function compressed by an integer factor or by a rational
factor, and by adding these versions to the original
autocorrelation function. Contrary to the prior art, where
an expensive oscillator bank is required for this purpose,
according to the invention this is done with weighting and
addition routines, which are easy to implement.
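Adding a compressed version of the autocorrelation function
can be sketched as follows. The function name
add_compressed, the sample values and the weight of 0.5 are
assumptions for illustration; the text above only states
that compressed versions are calculated and added to the
original autocorrelation function. Only integer compression
factors are handled in this sketch.

```python
def add_compressed(raw, factor, weight):
    """Add a weighted version of `raw` compressed by an integer factor.

    The compressed version at lag k takes the value of `raw` at lag
    k * factor, so a peak at delay d also appears at delay d / factor.
    """
    out = list(raw)
    for k in range(len(raw)):
        source = k * factor
        if source < len(raw):
            out[k] += weight * raw[source]
    return out


# Rhythm raw-information with a single peak at lag 4 (besides lag 0).
raw = [8.0, 0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0, 0.0]
post = add_compressed(raw, factor=2, weight=0.5)
```

After postprocessing, a signal portion is present at lag 2,
i.e. at half the delay of the original peak, which
corresponds to the double tempo.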

According to another aspect of the present invention, the
sum autocorrelation function is further postprocessed by
subtracting a version of the rhythm raw-information, i.e.
of the autocorrelation function, which is weighted by a
factor larger than zero and smaller than one and spread by
an integer factor larger than one. This has the advantage
of eliminating the ACF ambiguities at the integer multiples
of the delay to which an autocorrelation peak is
associated. In the prior art, no weighting of the spread
versions of the autocorrelation function is performed prior
to subtraction, so that the ambiguities are eliminated only
in the theoretical optimum case where the rhythm repeats
itself ideally cyclically. The weighted subtraction, in
contrast, makes it possible to take into account rhythm
information that does not repeat itself ideally cyclically,
by an appropriate choice of weighting factors, which can,
for example, be determined empirically.
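The weighted subtraction can be sketched as follows. The
function names, the linear interpolation used for
spreading, the sample values and the weight of 0.5 are
assumptions chosen for illustration; in practice the weight
would, as stated above, be chosen empirically.

```python
def stretch(values, factor):
    """Spread a lag curve by an integer factor (linear interpolation)."""
    out = []
    for k in range(len(values)):
        pos = k / factor
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * values[lo] + frac * values[hi])
    return out


def subtract_spread(raw, factor, weight):
    """Subtract a weighted version of `raw` spread by an integer factor."""
    spread = stretch(raw, factor)
    return [r - weight * s for r, s in zip(raw, spread)]


# Hypothetical raw-information of a rhythm that does not repeat ideally:
# the peak at twice the delay (lag 8) is only half as high as at lag 4.
raw = [8.0, 0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0, 3.5, 0.0, 0.0, 0.0, 0.0]
weighted = subtract_spread(raw, factor=2, weight=0.5)
unweighted = subtract_spread(raw, factor=2, weight=1.0)
```

With the weight of 0.5, the peak at lag 8 is cancelled
exactly, whereas the unweighted subtraction overshoots and
leaves a negative value at that lag.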

According to a preferred embodiment of the present
invention, autocorrelation function postprocessing is
performed by combining the rhythm information determined by
an autocorrelation function with compressed and/or spread
versions of it. When spread versions of the rhythm
information are used, they are subtracted from the rhythm
raw-information, while versions of the autocorrelation
function compressed by integer factors are added to the
rhythm raw-information.

In a preferred embodiment of the invention, the 
compressed/spread version is weighted with a factor between 
zero and one prior to adding and subtracting. 

According to another preferred embodiment of the present
invention, a quality evaluation of the rhythm information
is performed based on the postprocessed rhythm
raw-information to obtain a significance measure, such that
the quality evaluation is no longer influenced by
autocorrelation artifacts. Thus, a reliable quality
evaluation becomes possible, whereby the robustness of
determining the rhythm information of the audio signal can
be increased further.

Alternatively, the quality evaluation can already take
place prior to the ACF postprocessing. This has the
advantage that, when a flat course of the rhythm
raw-information is determined, i.e. no distinct rhythm
information, the ACF postprocessing for the sub-band signal
can be omitted, since this sub-band will in any case have
no importance when determining the rhythm information of
the audio signal due to its hardly expressive rhythm
information. In this way, the computing and memory effort
can be reduced further.

In the individual frequency bands, i.e. the sub-bands, the
conditions for finding rhythmical periodicities often
differ in how favorable they are. While, for example, in
pop music the signal in the middle range, such as around
1 kHz, is often dominated by a voice not corresponding to
the beat, the higher frequency ranges often contain mainly
percussion sounds, such as the hi-hat of the drums, which
allow a very good extraction of rhythmical regularities. In
other words, different frequency bands contain a different
amount of rhythmical information, depending on the audio
signal, and have a different quality or significance for
the rhythm information of the audio signal.

Therefore, according to the invention, the audio signal is
first divided into sub-band signals. Every sub-band signal
is examined with regard to its periodicity, to obtain
rhythm raw-information for every sub-band signal.
Thereupon, according to the present invention, an
evaluation of the quality of the periodicity of every
sub-band signal is performed to obtain a significance
measure for every sub-band signal. A high significance
measure indicates that clear rhythm information is present
in this sub-band signal, while a low significance measure
indicates that less clear rhythm information is present in
this sub-band signal.

According to a preferred embodiment of the present
invention, when examining a sub-band signal with regard to
its periodicity, first, a modified envelope of the sub-band
signal is calculated, and then an autocorrelation function
of the envelope is calculated. The autocorrelation function
of the envelope represents the rhythm raw-information.
Clear rhythm information is present when the
autocorrelation function shows clear maxima, while less
clear rhythm information is present when the
autocorrelation function of the envelope of the sub-band
signal has less significant signal peaks or no signal peaks
at all. An autocorrelation function that has clear signal
peaks will thus obtain a high significance measure, while
an autocorrelation function that has a relatively flat
signal form will obtain a low significance measure. As
discussed above, the artifacts of the autocorrelation
functions are eliminated according to the invention.

The individual rhythm raw-information of the individual
sub-band signals is not combined "blindly", but under
consideration of the significance measure for every
sub-band signal, to obtain the rhythm information of the
audio signal. If a sub-band signal has a high significance
measure, it is given preference when establishing the
rhythm information, while a sub-band signal that has a low
significance measure, i.e. a low quality with regard to the
rhythm information, is hardly or, in the extreme case, not
considered at all when establishing the rhythm information
of the audio signal.

This can be implemented in a computing-time-efficient way
by a weighting factor that depends on the significance
measure. While a sub-band signal that has a good quality
for the rhythm information, i.e. a high significance
measure, could obtain a weighting factor of 1, another
sub-band signal that has a smaller significance measure
will obtain a weighting factor smaller than 1. In the
extreme case, a sub-band signal that has a totally flat
autocorrelation function will have a weighting factor of 0.
The weighted autocorrelation functions, i.e. the weighted
rhythm raw-information, are then simply summed. When merely
one of all sub-band signals supplies good rhythm
information, while the other sub-band signals have
autocorrelation functions with a flat signal form, this
weighting can, in the extreme case, lead to all sub-band
signals apart from the one sub-band signal obtaining a
weighting factor of 0, i.e. not being considered at all
when establishing the rhythm information, so that the
rhythm information of the audio signal is established from
one single sub-band signal only.
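The significance-weighted combination can be sketched as
follows. The text does not specify at this point how the
significance measure is computed, so the measure used here
(how far the largest peak rises above the mean, relative to
the peak) is purely an assumption, as are the function
names, the normalization of the weights and the sample
values.

```python
def significance(acf):
    """Hypothetical measure: 0 for a flat curve, larger for clear peaks."""
    body = acf[1:]                # lag 0 is always maximal, so skip it
    peak = max(body)
    if peak <= 0.0:
        return 0.0
    mean = sum(body) / len(body)
    return (peak - mean) / peak


def combine(acfs):
    """Weight every sub-band ACF by its significance and sum the results."""
    scores = [significance(a) for a in acfs]
    top = max(scores)
    # The most significant band gets weight 1, a flat band gets weight 0.
    weights = [s / top if top > 0.0 else 0.0 for s in scores]
    length = len(acfs[0])
    combined = [
        sum(w * a[k] for w, a in zip(weights, acfs))
        for k in range(length)
    ]
    return weights, combined


drum_band = [4.0, 0.0, 3.0, 0.0, 2.0]    # clear peaks
voice_band = [4.0, 1.0, 1.0, 1.0, 1.0]   # flat apart from lag 0
weights, combined = combine([drum_band, voice_band])
```

In this example the flat band receives a weighting factor
of 0, so the combined result is established from the one
sub-band with clear peaks only.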

The inventive concept is advantageous in that it enables a
robust determination of the rhythm information, since
sub-band signals with unclear or even differing rhythm
information, for example when the voice has a different
rhythm than the actual beat of the piece, do not dilute or
"corrupt" the rhythm information of the audio signal.
Moreover, very noise-like sub-band signals, which provide a
system autocorrelation function with a totally flat signal
form, will not decrease the signal-to-noise ratio when
determining the rhythm information. Exactly this would
occur, however, if, as in the prior art, all
autocorrelation functions of the sub-band signals were
simply summed with the same weight.

It is another advantage of the inventive method that a
significance measure can be determined with small
additional computing effort, and that the evaluation of the
rhythm raw-information with the significance measure and
the subsequent summing can be performed efficiently without
large storage and computing-time effort, which makes the
inventive concept particularly suitable for real-time
applications as well.

Brief Description of the Drawings

Preferred embodiments of the present invention will be 
discussed in more detail below with reference to the 
accompanying drawings in which: 

Fig. 1 shows a block diagram of an apparatus for analyzing
an audio signal with a quality evaluation of the rhythm
raw-information;

Fig. 2 shows a block diagram of an apparatus for analyzing
an audio signal by using weighting factors based on the
significance measures;

Fig. 3 shows a block diagram of a known apparatus for
analyzing an audio signal with regard to rhythm
information;

Fig. 4 shows a block diagram of an apparatus for analyzing
an audio signal with regard to rhythm information by using
an autocorrelation function with a sub-band-wise
postprocessing of the rhythm raw-information; and

Fig. 5 shows a detailed block diagram of the means for
postprocessing of Fig. 4.

Detailed Description of Preferred Embodiments

Fig. 1 shows a block diagram of an apparatus for analyzing
an audio signal with regard to rhythm information. The
audio signal is fed via input 100 to means 102 for dividing
the audio signal into at least two sub-band signals 104a
and 104b. Every sub-band signal 104a, 104b is fed into
means 106a and 106b, respectively, for examining it with
regard to periodicities in the sub-band signal, to obtain
rhythm raw-information 108a and 108b, respectively, for
every sub-band signal. The rhythm raw-information is then
fed into means 110a, 110b for evaluating the quality of the
periodicity of each of the at least two sub-band signals,
to obtain a significance measure 112a, 112b for each of the
at least two sub-band signals. Both the rhythm
raw-information 108a, 108b and the significance measures
112a, 112b are fed to means 114 for establishing the rhythm
information of the audio signal. When establishing the
rhythm information of the audio signal, means 114 considers
the significance measures 112a, 112b for the sub-band
signals as well as the rhythm raw-information 108a, 108b of
at least one sub-band signal.

If means 110a for quality evaluation has, for example,
determined that no particular periodicity is present in the
sub-band signal 104a, the significance measure 112a will be
very small or equal to 0. In this case, means 114 for
establishing rhythm information determines that the
significance measure 112a is equal to 0, so that the rhythm
raw-information 108a of the sub-band signal 104a no longer
has to be considered at all when establishing the rhythm
information of the audio signal. The rhythm information of
the audio signal will then be determined exclusively on the
basis of the rhythm raw-information 108b of the sub-band
signal 104b.

In the following, reference will be made to Fig. 2 with
regard to a special embodiment of the apparatus of Fig. 1.
A common analysis filterbank can be used as means 102 for
dividing the audio signal, which provides a user-selectable
number of sub-band signals on the output side. Every
sub-band signal is then subjected to the processing of
means 106a, 106b and 106c, respectively, whereupon
significance measures of every rhythm raw-information are
established by means 110a to 110c. In the preferred
embodiment illustrated in Fig. 2, means 114 comprises means
114a for calculating weighting factors for every sub-band
signal based on the significance measure for this sub-band
signal and optionally also on those of the other sub-band
signals. Then, in means 114b, the rhythm raw-information
108a to 108c of every sub-band signal is weighted with the
weighting factor for this sub-band signal, whereupon, also
in means 114b, the weighted rhythm raw-information is
combined, for example summed up, to obtain the rhythm
information of the audio signal at the tempo output 116.

Thus, the inventive concept is as follows. After evaluating
the rhythmic information of the individual bands, which
can, for example, take place by envelope forming,
smoothing, differentiating, limiting to positive values and
forming the autocorrelation functions (means 106a to 106c),
the significance or quality of these intermediate results
is evaluated in means 110a to 110c. This is achieved with
the help of an evaluation function, which rates the
reliability of the respective individual results with a
significance measure. For every band, a weighting factor
for the extraction of the rhythm information is derived
from the significance measures of all sub-band signals. The
total result of the rhythm extraction is then obtained in
means 114b by combining the band-wise individual results
under consideration of their respective weighting factors.
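
The processing chain of means 106a to 106c named above can be
sketched as follows. This is only a minimal illustration under
stated assumptions: the text does not specify how the envelope
is formed or smoothed, so full-wave rectification and a moving
average are used here as stand-ins, and all function and
parameter names are hypothetical.

```python
def envelope(samples):
    """Envelope forming, assumed here to be full-wave rectification."""
    return [abs(x) for x in samples]


def smooth(values, width):
    """Smoothing, assumed here to be a causal moving average."""
    out = []
    acc = 0.0
    for i, v in enumerate(values):
        acc += v
        if i >= width:
            acc -= values[i - width]
        out.append(acc / min(i + 1, width))
    return out


def differentiate(values):
    """First-order difference; the first output sample is set to 0."""
    return [0.0] + [values[i] - values[i - 1] for i in range(1, len(values))]


def limit_to_positive(values):
    """Keep rising edges only (half-wave rectification)."""
    return [max(v, 0.0) for v in values]


def autocorrelation(values, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(values)
    return [sum(values[i] * values[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def rhythm_raw_information(sub_band, smooth_width, max_lag):
    """Chain the five steps for one sub-band signal."""
    onsets = limit_to_positive(
        differentiate(smooth(envelope(sub_band), smooth_width)))
    return autocorrelation(onsets, max_lag)
```

For a sub-band signal with one pulse every 8 samples, the
strongest peak of the result at a lag greater than 0 lies at
lag 8.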

As a result, an algorithm for rhythm analysis implemented
in such a way shows a good capacity to reliably find
rhythmical information in a signal, even under unfavorable
conditions. Thus, the inventive concept is distinguished by
a high robustness.

In a preferred embodiment, the rhythm raw-information 108a,
108b, 108c, which represents the periodicity of the
respective sub-band signal, is determined via an
autocorrelation function. In this case, it is preferred to
determine the significance measure by dividing a maximum
of the autocorrelation function by an average of the
autocorrelation function, and then subtracting the value 1.
It should be noted that every autocorrelation function
always provides a local maximum at a lag of 0, which
represents the energy of the signal. This maximum should
not be considered, so that the quality determination is not
corrupted.
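
A minimal sketch of this significance measure is given below.
The function name and the lag-window parameters are
hypothetical. Taking the average over the same lag window as
the maximum, and returning 0 for a window without positive
mean, are assumptions not stated in the text.

```python
def significance_max_over_mean(acf, min_lag, max_lag):
    """Maximum of the ACF divided by its average, minus 1.

    Only lags min_lag..max_lag are considered. min_lag must be
    at least 1 so that the energy peak at lag 0 is excluded.
    """
    if min_lag < 1:
        raise ValueError("min_lag must be >= 1 to skip the lag-0 peak")
    window = acf[min_lag:max_lag + 1]
    mean = sum(window) / len(window)
    if mean <= 0.0:
        # Assumption: treat a window without positive mean as not significant.
        return 0.0
    return max(window) / mean - 1.0
```

A flat autocorrelation function yields 0 with this measure,
whereas a single isolated peak yields a clearly positive value.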

Further, the autocorrelation function should only be
considered in a certain tempo range, i.e. from a maximum
lag, which corresponds to the lowest tempo of interest, to
a minimum lag, which corresponds to the highest tempo of
interest. A typical tempo range is between 60 bpm and
200 bpm.
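
The mapping from a tempo range to a lag range can be sketched
as follows. One beat at a tempo of b bpm lasts 60/b seconds,
which corresponds to 60 * rate / b samples. The parameter
rate_hz, the sampling rate of the signal on which the
autocorrelation function is computed, is an assumption, as the
text does not name a rate.

```python
def lag_range_for_tempo(min_bpm, max_bpm, rate_hz):
    """Return (min_lag, max_lag) in samples for a tempo range in bpm."""
    min_lag = round(60.0 * rate_hz / max_bpm)  # highest tempo, shortest lag
    max_lag = round(60.0 * rate_hz / min_bpm)  # lowest tempo, longest lag
    return min_lag, max_lag
```

At an assumed rate of 100 Hz, the range from 60 bpm to 200 bpm
corresponds to lags from 30 to 100 samples.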

Alternatively, the ratio between the arithmetic
average of the autocorrelation function in the tempo range
of interest and the geometric average of the
autocorrelation function in the tempo range of interest can
be determined as significance measure. It is known that
the geometric average and the arithmetic average of the
autocorrelation function are equal when all values of the
autocorrelation function are equal, i.e. when the
autocorrelation function has a flat signal form. In this
case, the significance measure would have a value equal to
1, which means that the rhythm raw-information is not
significant.

In the case of an autocorrelation function with
strong peaks, the ratio of arithmetic average to geometric
average would be greater than 1, which means that the
autocorrelation function contains good rhythm information.
The smaller the ratio between arithmetic average and
geometric average becomes, the flatter the
autocorrelation function is and the fewer periodicities it
contains, which means that the rhythm information of this
sub-band signal is less significant, i.e. of lower quality,
which will be expressed in a lower weighting factor or a
weighting factor of 0.
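
The ratio of arithmetic to geometric average can be sketched
as below. The function name and the floor parameter are
hypothetical. The floor is an added assumption: the geometric
average is only defined for positive values, so values of the
autocorrelation function at or below zero are raised to a
small positive constant first.

```python
import math


def significance_am_over_gm(acf, min_lag, max_lag, floor=1e-12):
    """Arithmetic average divided by geometric average of the ACF
    over lags min_lag..max_lag. Equals 1 for a flat ACF."""
    window = [max(v, floor) for v in acf[min_lag:max_lag + 1]]
    arithmetic = sum(window) / len(window)
    # Geometric average computed in the log domain to avoid overflow.
    geometric = math.exp(sum(math.log(v) for v in window) / len(window))
    return arithmetic / geometric
```

For the window values 1, 1, 16, 1 the arithmetic average is
4.75 and the geometric average is 2, giving a ratio of 2.375,
while a constant window gives a ratio of 1.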

With regard to the weighting factors, several possibilities
exist. A relative weighting is preferred, such that the
weighting factors of all sub-band signals add up to 1, i.e.
the weighting factor of a band is determined as the
significance value of this band divided by the sum of all
significance values. In this case, a relative weighting is
performed prior to the summation of the weighted rhythm
raw-information, to obtain the rhythm information of the
audio signal.
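
The relative weighting and the subsequent summation can be
sketched as follows. The function names are hypothetical, and
the fallback to equal weights when no band is significant is
an assumption, since the text does not cover that case.

```python
def relative_weights(significances):
    """Weight of a band = its significance / sum of all significances."""
    total = sum(significances)
    if total <= 0.0:
        # Assumption: with no significant band, weight all bands equally.
        return [1.0 / len(significances)] * len(significances)
    return [s / total for s in significances]


def combine(raw_information, weights):
    """Weighted sum of the per-band rhythm raw-information, lag by lag."""
    n_lags = len(raw_information[0])
    return [sum(w * band[lag] for w, band in zip(weights, raw_information))
            for lag in range(n_lags)]
```

With significance values 3, 1 and 0, the weights are 0.75,
0.25 and 0, so the third band does not contribute to the sum
at all.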

As has already been described, it is preferred to
perform the evaluation of the rhythm information by using
an autocorrelation function. This case is illustrated in
Fig. 4. The audio signal is fed via the audio signal input
100 to means 102 for dividing the audio signal into
sub-band signals 104a and 104b. Every sub-band signal
is then examined in means 106a and 106b, respectively,
as has been explained, by using an autocorrelation
function, to establish the periodicity of the sub-band
signal. The rhythm raw-information 108a, 108b is then
present at the output of means 106a, 106b, respectively. It
is fed into means 118a and 118b, respectively, to
post-process the rhythm raw-information output by means
106a and 106b, respectively, via the autocorrelation
function. Thereby, it is ensured, among other things, that
the ambiguities of the autocorrelation function, i.e. that
signal peaks also occur at integer multiples of the lags,
are eliminated sub-band-wise, to obtain post-processed
rhythm raw-information 120a and 120b, respectively.

This has the advantage that the ambiguities of the
autocorrelation functions, i.e. of the rhythm
raw-information 108a, 108b, are already eliminated
sub-band-wise, and not only, as in the prior art, after the
summation of the individual autocorrelation functions. In
addition, the band-wise elimination of the ambiguities in
the autocorrelation functions by means 118a, 118b makes it
possible to handle the rhythm raw-information of the
sub-band signals independently of one another. They can,
for example, be subjected to a quality evaluation via means
110a for the rhythm raw-information 108a or via means 110b
for the rhythm raw-information 108b.

As illustrated by the dotted lines in Fig. 4, the quality
evaluation can also take place with regard to the
post-processed rhythm raw-information, wherein this last
possibility is preferred, since the quality evaluation
based on the post-processed rhythm raw-information ensures
that the quality of information is evaluated which is no
longer ambiguous.

Establishing the rhythm information by means 114 will then
take place based on the post-processed rhythm information
of a channel and preferably also based on the significance
measure for this channel.

When a quality evaluation is performed based on the rhythm
raw-information, i.e. the signal prior to means
118a, this is advantageous in that, when it is
determined that the significance measure equals 0, i.e.
that the autocorrelation function has a flat signal form,
the post-processing via means 118a can be omitted entirely
to save computing-time resources.

In the following, reference will be made to Fig. 5, to
illustrate a more detailed construction of means 118a or
118b for post-processing rhythm raw-information. First, the
sub-band signal, such as 104a, is fed into means 106a for
examining the periodicity of the sub-band signal via an
autocorrelation function, to obtain rhythm raw-information
108a. To eliminate the ambiguities sub-band-wise, a spread
autocorrelation function can be calculated via means 121 as
in the prior art, wherein means 121 is configured to
calculate the spread autocorrelation function such that it
is spread by an integer multiple of a lag. Means 122 is
configured in this case to subtract this spread
autocorrelation function from the original autocorrelation
function, i.e. the rhythm raw-information 108a.

In particular, it is preferred to first calculate an
autocorrelation function spread to double the size and
then subtract it from the rhythm raw-information 108a.
Then, in the next step, an autocorrelation function spread
by the factor 3 is calculated in means 121 and subtracted
from the result of the previous subtraction, so that
gradually all ambiguities are eliminated from the
rhythm raw-information.
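
The stepwise subtraction of spread versions can be sketched as
below. This is an illustration under assumptions: the
function names are hypothetical, the spreading is done by
linear interpolation along the lag axis, each spread version
is derived from the original autocorrelation function, and the
value at lag 0 is assumed to have been set to 0 beforehand so
that the energy peak is not spread into the small lags.

```python
def spread(acf, factor):
    """Stretch the ACF along the lag axis: out[lag] = acf[lag / factor],
    with linear interpolation between neighboring lags."""
    out = []
    last = len(acf) - 1
    for lag in range(len(acf)):
        pos = lag / factor
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * acf[lo] + frac * acf[hi])
    return out


def remove_multiples(acf, factors=(2, 3), weight=1.0):
    """Subtract the ACF spread by each factor, one factor after another."""
    result = list(acf)
    for factor in factors:
        stretched = spread(acf, factor)
        result = [r - weight * s for r, s in zip(result, stretched)]
    return result
```

For an idealized function with equally high peaks at lags 4, 8
and 12, the peak at lag 4 is kept while the peaks at lags 8
and 12 are cancelled. Real autocorrelation functions have
unequal peak heights, so the cancellation is only partial.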

Alternatively, or additionally, means 121 can be configured
to calculate an autocorrelation function compressed by an
integer factor, i.e. spread with a factor smaller than 1,
wherein this compressed function is added to the rhythm
raw-information by means 122, to also generate portions for
lags t0/2, t0/3, etc.
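
The addition of compressed versions can be sketched as
follows. The function names are hypothetical, and filling
lags without a source value with 0 is an assumption.

```python
def compress(acf, factor):
    """Squeeze the ACF along the lag axis: out[lag] = acf[lag * factor].
    Lags whose source index lies beyond the ACF are set to 0."""
    return [acf[lag * factor] if lag * factor < len(acf) else 0.0
            for lag in range(len(acf))]


def add_submultiples(acf, factors=(2, 3), weight=1.0):
    """Add the ACF compressed by each factor to the original ACF."""
    result = list(acf)
    for factor in factors:
        squeezed = compress(acf, factor)
        result = [r + weight * c for r, c in zip(result, squeezed)]
    return result
```

A single peak at lag 12 then also produces portions at lag 6
(factor 2) and lag 4 (factor 3), scaled by the chosen weight.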

In addition, the spread or compressed version of
the rhythm raw-information 108a can be weighted prior to
subtracting or adding, respectively, to obtain flexibility
here as well in the sense of a high robustness.

By the method of examining the periodicity of a sub-band
signal based on an autocorrelation function, a further
improvement can be obtained when the properties of the
autocorrelation function are taken into account and the
post-processing is performed by using means 118a or 118b.
A periodic sequence of note onsets with a distance t0
does not only generate an ACF peak at a lag t0, but also at
2t0, 3t0, etc. This leads to an ambiguity in the tempo
detection, i.e. in the search for a significant maximum in
the autocorrelation function. The ambiguities can be
eliminated when versions of the ACF spread by integer
factors are subtracted sub-band-wise (weighted) from the
output value.
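
The ambiguity described here can be reproduced with a short
experiment. The sketch below uses hypothetical helper names
and an idealized onset sequence, an impulse every 5 samples,
and lists the lags at which the autocorrelation function has a
local peak.

```python
def autocorrelation(values, max_lag):
    """Unnormalized autocorrelation for lags 0..max_lag."""
    n = len(values)
    return [sum(values[i] * values[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def peak_lags(acf):
    """Lags greater than 0 at which the ACF is a strict local maximum."""
    return [lag for lag in range(1, len(acf) - 1)
            if acf[lag] > acf[lag - 1] and acf[lag] > acf[lag + 1]]


onsets = [1.0 if i % 5 == 0 else 0.0 for i in range(50)]
peaks = peak_lags(autocorrelation(onsets, 16))
```

Here peaks contains the lags 5, 10 and 15, i.e. t0, 2t0 and
3t0 for t0 = 5 samples.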

In addition, the compressed versions of the rhythm
raw-information 108a can be weighted with a factor unequal
to one prior to adding, to obtain flexibility in the sense
of high robustness here as well.

Further, the autocorrelation function has the problem
that it provides no information at t0/2, t0/3,
etc., i.e. at double or triple the "base
tempo", which will lead to wrong results, particularly
when two instruments, which lie in different sub-bands,
define the rhythm of the signal together. This issue is
addressed by calculating versions of the autocorrelation
function compressed by integer factors and adding them
to the rhythm raw-information either weighted or
unweighted.

Thus, ACF post-processing takes place sub-band-wise,
wherein an autocorrelation function is calculated for at
least one sub-band signal and is combined with
compressed or spread versions of this function.

According to another aspect of the present invention,
first, the sum autocorrelation function of the sub-bands is
generated, whereupon versions of the sum autocorrelation
function compressed by integer factors are added,
preferably weighted, to eliminate the inadequacies of the
autocorrelation function at double, triple, etc. the tempo.

According to another aspect, the post-processing of the sum
autocorrelation function is performed to eliminate the
ambiguities at half, a third, etc. of the tempo,
by not just subtracting the versions of
the sum autocorrelation function spread by integer factors,
but weighting them prior to subtraction with a factor
unequal to one, preferably smaller than one and larger than
zero, and subtracting them only then. Thereby, a more
robust determination of the rhythm information becomes
possible, since unweighted subtraction provides a full
elimination of the ACF ambiguities only for ideal
sinusoidal signals.

While this invention has been described in terms of several
preferred embodiments, there are alterations, permutations,
and equivalents which fall within the scope of this
invention. It should also be noted that there are many
alternative ways of implementing the methods and
compositions of the present invention. It is therefore
intended that the following appended claims be interpreted
as including all such alterations, permutations, and
equivalents as fall within the true spirit and scope of the
present invention.



